High-Performance Haplotype Assembly
Authors
Abstract
Haplotype assembly is an essential step in human genome analysis. It is typically formalised as the Minimum Error Correction (MEC) problem, which is NP-hard. MEC has been approached using heuristics, integer linear programming, and fixed-parameter tractability (FPT), including approaches whose runtime is exponential in the length of the DNA fragments obtained by the sequencing process. Technological improvements are currently increasing fragment length, which drastically raises the computational cost of such methods. We present pWhatsHap, a multi-core parallelisation of WhatsHap, a recent FPT optimal approach to MEC. WhatsHap moves complexity from fragment length to fragment overlap and is hence of particular interest given current trends in sequencing technology. pWhatsHap further improves the efficiency of solving the MEC problem, as shown by experiments performed on datasets with high coverage.
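The abstract names the MEC objective without spelling it out. As a rough illustration only, the sketch below uses the standard formulation: fragments are split into two groups, and within each group every variant column is corrected to its majority allele, so the cost of a column is the count of the minority allele. The function names `mec_cost` and `brute_force_mec` and the toy fragments are hypothetical, and the exhaustive search is exponential in the number of fragments; it is not how WhatsHap or pWhatsHap work.

```python
from itertools import product


def mec_cost(fragments, assignment):
    """Number of corrections needed for one bipartition of the fragments.

    fragments: equal-length strings over '0', '1' and '-' (site not covered).
    assignment: one 0/1 label per fragment, naming its haplotype group.
    """
    n_cols = len(fragments[0])
    cost = 0
    for side in (0, 1):
        group = [f for f, a in zip(fragments, assignment) if a == side]
        for col in range(n_cols):
            zeros = sum(1 for f in group if f[col] == "0")
            ones = sum(1 for f in group if f[col] == "1")
            # Flip the minority allele so the column agrees within the group.
            cost += min(zeros, ones)
    return cost


def brute_force_mec(fragments):
    """Try every bipartition; return (minimum cost, assignment)."""
    best = None
    # Fix the first fragment in group 0 to skip mirrored bipartitions.
    for rest in product((0, 1), repeat=len(fragments) - 1):
        assignment = (0,) + rest
        cost = mec_cost(fragments, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical toy input: five fragments over four variant sites.
fragments = ["010-", "0101", "101-", "-010", "0111"]
best_cost, best_assignment = brute_force_mec(fragments)
print(best_cost, best_assignment)
```

Even on this tiny input the search visits 2^(n-1) bipartitions for n fragments, which is why practical methods parameterise the problem differently, for example by fragment length or by coverage as the abstract describes.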
Similar resources
Self-organizing map approaches for the haplotype assembly problem
Haplotype assembly is to reconstruct a pair of haplotypes from SNP values observed in a set of individual DNA fragments. In this paper, we focus on studying minimum error correction (MEC) model for the haplotype assembly problem and explore self-organizing map (SOM) methods for this problem. Specifically, haplotype assembly by MEC is formulated into an integer linear programming model. Since th...
Tumor Haplotype Assembly Algorithms for Cancer Genomics
The growing availability of inexpensive high-throughput sequence data is enabling researchers to sequence tumor populations within a single individual at high coverage. But, cancer genome sequence evolution and mutational phenomena like driver mutations and gene fusions are difficult to investigate without first reconstructing tumor haplotype sequences. Haplotype assembly of single individual t...
Haplotype assembly in polyploid genomes and identical by descent shared tracts
MOTIVATION Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing high-throughput sequencing data must scale favorably in terms of both accuracy and comp...
HapCompass: A Fast Cycle Basis Algorithm for Accurate Haplotype Assembly of Sequence Data
Genome assembly methods produce haplotype phase ambiguous assemblies due to limitations in current sequencing technologies. Determining the haplotype phase of an individual is computationally challenging and experimentally expensive. However, haplotype phase information is crucial in many bioinformatics workflows such as genetic association studies and genomic imputation. Current computational ...
Improving the performance measurement using overall equipment effectiveness in an automotive industry
Considering the present business competitive scenario, the automotive industry is under pressure to achieve higher productivity. A high level of performance and quality standard could be achieved through improving the Overall Equipment Effectiveness (OEE) of the equipment in an automotive industry. Thus, the aim of this study is to investigate the performance measurement through OEE theory in a...
HapCHAT: Adaptive haplotype assembly for efficiently leveraging high coverage in long reads
Motivation: Haplotype assembly is the process of reconstructing the haplotypes of an individual from sequencing reads. Computational methods for this problem have been shown to achieve high accuracy on long reads, which are becoming cheaper to produce and more widely available. Larger amounts of data, usually originating from increased coverage, are highly beneficial for improving the quality of the...
Publication date: 2014